Eszter Friedman, MTA SZTAKI, feszter@info.ilab.sztaki.hu
Julianna Göbölös-Szabó, MTA SZTAKI, gobolos.szabo.julianna@gmail.com
Adrienn Szabó, MTA SZTAKI, adrienn.szabo4@gmail.com [PRIMARY contact]
András Lukács, MTA SZTAKI, alukacs@sztaki.hu
The Epidemic Outbreak Visualizer (EOV) plots how the number of diseased people changes over time, optionally filtered by features such as symptoms, age or gender. The tool highlights unexpected or unusual events and trends, which can help in recognizing a new epidemic outbreak, and it helps in comparing outbreaks in different cities. The tool was implemented for this contest by Adrienn Szabó in a short week.
Video:
MC2.1: Analyze the records you have been given to
characterize the spread of the disease. You should take into
consideration symptoms of the disease, mortality rates, temporal
patterns of the onset, peak and recovery of the disease. Health officials
hope that whatever tools are developed to analyze this data might be available
for the next epidemic outbreak. They are looking for visualization tools
that will save them analysis time so they can react quickly.
First we transformed the hospital records into simple time series data. We considered only frequent symptoms (such as abdominal pain, back pain, diarrhea, headache, head bleeding, cough, fever and rash), gender, and three age groups (under 20 years, 20 to 59 years, and over 60). Next, we aggregated these features, and all pairs of them, into daily data: for each day we counted the number of people with each feature or feature pair, so that we could observe how these counts change over time. (Generating all these input files took about 1 hour on a strong commodity PC.)
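The aggregation step described above could be sketched roughly as follows. This is a minimal illustration, not the code used for the entry: the record layout (`day`, `age`, `gender`, `symptoms`), the label strings and the function names are all assumptions, and the source does not say which group a 60-year-old belongs to, so the sketch places them in the oldest group.

```python
from collections import Counter
from datetime import date
from itertools import combinations

def age_group(age):
    """Map an age to one of three groups (assumed boundaries: <20, 20-59, 60+)."""
    if age < 20:
        return "age<20"
    if age < 60:
        return "age20-59"
    return "age60+"

def features(record):
    """Return the sorted feature labels of one admission record."""
    labels = {f"symptom:{s}" for s in record["symptoms"]}
    labels.add(f"gender:{record['gender']}")
    labels.add(age_group(record["age"]))
    return sorted(labels)

def daily_counts(records):
    """Count patients per (day, feature) and per (day, feature pair)."""
    counts = Counter()
    for rec in records:
        labels = features(rec)
        for label in labels:
            counts[(rec["day"], (label,))] += 1
        # Labels are sorted, so each pair has one canonical order.
        for pair in combinations(labels, 2):
            counts[(rec["day"], pair)] += 1
    return counts

# Hypothetical sample records, for illustration only.
records = [
    {"day": date(2009, 5, 1), "age": 34, "gender": "F", "symptoms": ["fever", "rash"]},
    {"day": date(2009, 5, 1), "age": 71, "gender": "M", "symptoms": ["fever"]},
    {"day": date(2009, 5, 2), "age": 12, "gender": "F", "symptoms": ["cough"]},
]
counts = daily_counts(records)
```

Each key of `counts` is a day together with a one-element or two-element tuple of labels, so single features and feature pairs can be looked up in the same way.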
To compare the changes of different symptoms, we wanted to monitor how the counts differ from the number of patients in a normal, “epidemic-free” period. We therefore computed the mean and variance of the number of diseased people over the first 5 and the last 20 days of the time series, and assigned to each day an index showing how far that day's patient count lies from the expected value. If this index has a large positive value on several consecutive days, an epidemic is probably breaking out, so watching this value can help health officials react as soon as possible. Note that the index can jump accidentally without an epidemic, because the number of patients can be seen as a random variable, but it is very unlikely that the index remains high for more than 2-3 consecutive days without an epidemic. The tool can also flag unusual growth in the number of deceased people, again filtered by symptoms or by the other features mentioned above.
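One possible reading of this index is a standard score against the baseline days. The text does not give the exact formula, so the sketch below is an assumption: it divides the difference from the baseline mean by the baseline standard deviation, and the threshold of 3.0 and the minimum run of 3 days are illustrative values, not ones taken from the tool.

```python
from statistics import mean, pstdev

def baseline_days(series, head=5, tail=20):
    """Days assumed epidemic-free: the first `head` and the last `tail` values."""
    return series[:head] + series[-tail:]

def deviation_index(series, head=5, tail=20):
    """Standard score of each day's count against the baseline (assumed formula)."""
    base = baseline_days(series, head, tail)
    mu = mean(base)
    sigma = pstdev(base)
    if sigma == 0:
        sigma = 1.0  # assumption: fall back to 1 for a constant baseline
    return [(x - mu) / sigma for x in series]

def longest_run_above(index, threshold):
    """Length of the longest streak of consecutive days above the threshold."""
    longest = current = 0
    for value in index:
        current = current + 1 if value > threshold else 0
        longest = max(longest, current)
    return longest

def outbreak_suspected(series, threshold=3.0, min_days=3):
    """True if the index stays above `threshold` for at least `min_days` days."""
    return longest_run_above(deviation_index(series), threshold) >= min_days
```

Requiring a run of several days rather than a single high value reflects the observation above that an isolated jump can occur by chance, while a sustained one is unlikely without an epidemic.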
These time series are plotted in a table, and the index of “probability” is marked with different colors, as in a heat map. If more people than expected are admitted to a hospital, the cell for that symptom gets a more reddish color; if the count is around the expected daily value, it is shown in green; blue means a patient number below expectations. The visualizer highlights cases in which the number of deaths related to one of the considered symptoms is above a threshold (with a thicker border around the cell). We have to add that in some cases the sample used to compute the mean and variance of the normal period was so small that any slight growth in the number of patients could cause a warning (a red cell). To avoid such situations we down-scaled these noisy data. To check a suspicious symptom or symptom pair, the user can left-click any cell of the table to see a chart of the number of admitted patients with the selected symptom (or pair) over the whole time period. Right-clicking brings up a similar chart for deceased patients. This helps in judging the severity of the situation and gives a more complete picture of it.
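The cell styling described above might look like the following sketch. Everything here is an assumption made for illustration: the linear blue-green-red ramp, the saturation point `scale`, the way noisy cells are damped by the size of the baseline sample (`min_patients`), and the border widths are not specified in the text.

```python
def cell_color(index, scale=3.0):
    """Map a deviation index to RGB: blue below, green near, red above expectation."""
    t = max(-1.0, min(1.0, index / scale))  # clamp to [-1, 1]
    if t >= 0:
        return (round(255 * t), round(255 * (1 - t)), 0)
    return (0, round(255 * (1 + t)), round(255 * -t))

def damped_index(index, baseline_patients, min_patients=20):
    """Scale down the index when the baseline sample is small (assumed rule)."""
    factor = min(1.0, baseline_patients / min_patients)
    return index * factor

def border_width(deaths, threshold):
    """Thicker border when the death count exceeds the threshold."""
    return 3 if deaths > threshold else 1
```

With this damping rule, a cell whose baseline holds only a handful of patients needs a much larger raw index before it turns red, which is one way to avoid warnings triggered by slight growth in sparse data.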
On the
left there is the heat map of
With this
tool we are able to recognize the epidemic in
MC2.2: Compare the outbreak across
cities. Factors to consider include timing of outbreaks, numbers of
people infected and recovery ability of the individual cities. Identify
any anomalies you found.
Having
analyzed the hospital records we found that there was no epidemic outbreak in
·
·
·
·
·
·
·
·
·
The countries can be sorted into groups according to their symptoms:
1. In
2. In
3. The epidemic is the
most severe in
The order of the outbreaks is the following:
| Location | Time of Outbreak | Peak | Recovery | Duration |
| Nairobi | 04.28.09 | Between 05.10 and 05.20 | 05.28.09 | 30 days |
| Karachi | 04.30.09 | Between 05.11 and 05.20 | 06.06.09 | 40 days |
| Iran | 05.01.09 | Between 05.16 and 05.23 | 06.06.09 | 37 days |
| Aleppo | 05.02.09 | Between 05.10 and 05.20 | 05.30.09 | 29 days |
| Venezuela | 05.02.09 | Between 05.15 and 05.23 | 05.30.09 | 28 days |
| Lebanon | 05.03.09 | Between 05.14 and 05.21 | 05.30.09 | 27 days |
| Saudi Arabia | 05.04.09 | Between 05.15 and 05.22 | 06.01.09 | 27 days |
| Yemen | 05.04.09 | Between 05.12 and 05.21 | 06.03.09 | 29 days |
| Colombia | 05.05.09 | Between 05.16 and 05.25 | 06.01.10 | 25 days |
The number of deceased people seems to correlate with the intensity of the epidemic. In
This
screenshot shows the number of dead patients in